Latent class model with conditional dependency per modes to cluster categorical data
نویسندگان
چکیده
We propose a parsimonious extension of the classical latent class model to cluster categorical data by relaxing the conditional independence assumption. Under this new mixture model, named Conditional Modes Model (CMM), variables are grouped into conditionally independent blocks. Each block follows a parsimonious multinomial distribution where the few free parameters model the probabilities of the most likely levels, while the remaining probability mass is uniformly spread over the other levels of the block. Thus, when the conditional independence assumption holds, this model defines parsimonious versions of the standard latent class model. Moreover, when this assumption is violated, the proposed model brings out the main intra-class dependencies between variables, summarizing thus each class with relatively few characteristic levels. The model selection is carried out by an hybrid MCMC algorithm that does not require preliminary parameter estimation. Then, the maximum likelihood estimation is performed via an EM algorithm only for the best model. The model properties are illustrated on simulated data and on three real data sets by using the associated R package CoModes. The results show that this model allows to reduce biases involved by the conditional independence assumption while providing meaningful parameters.
منابع مشابه
An application of Measurement error evaluation using latent class analysis
Latent class analysis (LCA) is a method of evaluating non sampling errors, especially measurement error in categorical data. Biemer (2011) introduced four latent class modeling approaches: probability model parameterization, log linear model, modified path model, and graphical model using path diagrams. These models are interchangeable. Latent class probability models express l...
متن کاملBeyond the number of classes: separating substantive from non-substantive dependence in latent class analysis
Abstract Latent class analysis (LCA) for categorical data is a model-based clustering and classification technique applied in a wide range of fields including the social sciences, machine learning, psychiatry, public health, and epidemiology. Its central assumption is conditional independence of the indicators given the latent class, i.e. “local independence”; violations can appear as model mis...
متن کاملModel-Based Clustering for Conditionally Correlated Categorical Data
An extension of the latent class model is proposed for clustering categorical data by relaxing the classical class conditional independence assumption of variables. In this model, variables are grouped into inter-independent and intra-dependent blocks in order to consider the main intra-class correlations. The dependence between variables grouped into the same block of a class is taken into acc...
متن کاملExtension of K-Modes Algorithm for Generating Clusters Automatically
K-Modes is an eminent algorithm for clustering data set with categorical attributes. This algorithm is famous for its simplicity and speed. The KModes is an extension of the K-Means algorithm for categorical data. Since K-Modes is used for categorical data so ‘Simple Matching Dissimilarity’ measure is used instead of Euclidean distance and the ‘Modes’ of clusters are used instead of ‘Means’. Ho...
متن کاملMixture of latent trait analyzers for model-based clustering of categorical data
Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications. However, model-based clustering methods for categorical data are less standard. Latent class analysis is a commonly used method for model-based clustering of binary data and/or categorical data, but due to an assumed local independence structure there may not be a correspon...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Adv. Data Analysis and Classification
دوره 10 شماره
صفحات -
تاریخ انتشار 2016